Abstract

We consider the optimal servicing of a queue with sigmoid server performance. Sigmoid server performance arises in various domains, including human decision making, visual perception, human-machine communication, and advertising response. Tasks arrive at the server at a given rate. Each task has a deadline that is incorporated as a latency penalty. We investigate the trade-off between the reward obtained by processing the current task and the penalty incurred due to the tasks waiting in the queue. We study this optimization problem in a Markov decision process (MDP) framework and show that the MDP formulation is equivalent to a certainty-equivalent problem. We determine the receding horizon servicing policy for the queue and show that the optimal policy may drop some tasks, that is, may not process a task at all. We then develop an adaptive policy that incorporates all the available information about the current tasks and show that the adaptive policy improves the performance significantly. Finally, we present a comparative study of the receding horizon policy for the certainty-equivalent problem and the adaptive policy. We also suggest guidelines for the design of such queues.
1. Introduction

The recent National Robotics Initiative [10] emphasizes innovative robotics research and applications, in particular the realization of co-robots acting in direct support of, and in a symbiotic relationship with, human partners. Such co-robots will facilitate better interaction between the human partner and the automaton. In complex and information-rich environments, one of the key roles of these co-robots is to help the human partner focus her attention efficiently. A particular example of such a setting is the surveillance mission, where the human operator monitors the evidence collected by autonomous agents [5, 7]. The excessive amount of information available in such systems often results in poor decisions by the human operator [23]. This emphasizes the need for a support system that helps the human operator optimally focus her attention.

Recently, there has been significant interest in understanding the physics of human decision making [4]. Several mathematical models for human decision making have been proposed [4, 15, 27]. These models suggest that the correctness of the decision of a human operator in a binary decision making scenario evolves as a sigmoid function of the time duration allocated to the decision. Thus, the probability of a correct decision by a human operator increases slowly for small and for large time-duration allocations, and increases quickly for moderate time-duration allocations. The sigmoid function also models the quality of human-machine communication [27], human performance in multiple target search [12], the advertising response function [26], and the expected profit in simultaneous bidding [17]. Therefore, the analysis presented in this paper can also be used to determine optimal human-machine communication policies, optimal search strategies, the optimal advertisement duration allocation, and optimal bidding strategies. In this paper, we generically refer to the server with sigmoid performance as a human operator and to the tasks as decision making tasks. When a human operator has to serve a queue of decision making tasks in real time, the tasks waiting in the queue (e.g., camera feeds) lose value continuously. This trade-off between the correctness of the decision and the loss in value of the pending tasks is of critical importance for the performance of the human operator. In this paper, we address this trade-off and determine the optimal duration allocation policies for a human operator serving a decision making queue.

There has been significant interest in the study of the performance of a human operator serving a queue. In an early work, Schmidt [21] models the human as a server and numerically studies a queueing model to determine the performance of a human air traffic controller. Recently, Savla et al. [20] study human supervisory control for unmanned aerial vehicle operations: they model the system by a simple queueing network with two components in series, the first of which is a spatial queue with vehicles as servers and the second a conventional queue with human operators as servers. They design joint motion coordination and operator scheduling policies that minimize the expected time needed to classify a target after its appearance. The performance of the human operator based on her utilization history has been incorporated to design maximally stabilizing task release policies for a human-in-the-loop queue in [19, 18]. Bertuccelli et al. [3] study human supervisory control as a queue with re-look tasks, that is, policies in which the operator can put tasks in an orbiting queue for a re-look later. An optimal scheduling problem in human supervisory control is studied in [2], where the authors determine a sequence in which the tasks should be serviced so that the accumulated reward is maximized. Powel et al. [16] model a mixed team of humans and robots as a multi-server queue and incorporate a human fatigue model to determine the performance of the team. They present a comparative study of fixed and rolling work-shifts of the operators.

The optimal control of queueing systems [22] is a classical problem in queueing theory. Stidham et al. [13] study optimal service policies for an M/G/1 queue. They formulate a semi-Markov decision process and describe the qualitative features of the solution. Certain technical assumptions in [13] are relaxed by George et al. [8]. In contrast to the models discussed here, these studies assume identical tasks and submodular performance functions. Hernandez-Lerma et al. [11] determine optimal servicing policies for identical tasks with some arrival rate, and adapt the optimal policy as the arrival rate is learned.

In this paper, we study the problem of optimal time-duration allocation in a queue of binary decision making tasks with a human operator. We refer to such queues as decision making queues. We assume that tasks come with processing deadlines and incorporate these deadlines as a soft constraint, namely, a latency penalty. We consider two particular problems. First, we consider a static queue with latency penalty. Here, the human operator has to serve a given number of tasks and incurs a penalty due to the delay in processing each task. This penalty can be thought of as the loss in value of the task over time. Second, we consider a dynamic queue of decision making tasks. The tasks arrive at a fixed rate and the operator incurs a penalty for the delay in processing each task. In both problems, there is a trade-off between the reward obtained by processing a task and the penalty incurred due to the resulting delay in processing other tasks. We address this particular trade-off.

The major contributions of this work are as follows: (i) we determine the optimal duration allocation policy for the static decision making queue with latency penalty; (ii) we pose an MDP to determine the optimal allocations for the dynamic decision making queue and show that the MDP formulation is equivalent to a certainty-equivalent problem; (iii) we provide a simple procedure to determine a receding horizon policy for the certainty-equivalent problem, namely, the certainty-equivalent policy; (iv) we establish performance bounds for the certainty-equivalent policy; (v) we study an adaptive algorithm that incorporates all the available information about the current tasks and improves the performance of the certainty-equivalent policy; (vi) we present a comparative study of the certainty-equivalent policy and the adaptive policy; and (vii) we suggest some guidelines for the design of decision making queues.

The remainder of the paper is organized as follows. We discuss some preliminary concepts in Section 2. We present the problem setup in Section 3. The static queue with latency penalty is considered in Section 4. We pose the optimization problems associated with the dynamic queue with latency penalty and study their properties in Section 5. We present and analyze receding horizon algorithms for these optimization problems in Section 6. A real-time adaptive algorithm is studied in Section 7. Our conclusions are presented in Section 8.


2. Preliminaries

In this section, we present some concepts that are used throughout the paper. We start with some models of human decision making, followed by some properties of sigmoid functions. We close the section with a discussion of receding horizon optimization.

2.1. Speed-accuracy trade-off in human decision making

Consider the scenario where, based on the collected evidence, the human has to decide on one of two alternatives $H_0$ and $H_1$. The evolution of the probability of a correct decision has been studied in the cognitive psychology literature [15, 4].

Pew's model: The probability of deciding on hypothesis $H_1$, given that hypothesis $H_1$ is true, at a given time $t \in \mathbb{R}_{\ge 0}$ is given by

$$\mathbb{P}(\text{say } H_1 \mid H_1, t) = \frac{p_0}{1 + e^{-(a t - b)}},$$

where $p_0 \in [0, 1]$ and $a, b \in \mathbb{R}$ are some parameters specific to the human operator [15].

Drift-diffusion model: Conditioned on the hypothesis $H_1$, the evolution of the evidence for decision making is modeled as a drift-diffusion process [4]; that is, for a given drift rate $\beta \in \mathbb{R}_{>0}$ and diffusion rate $\sigma \in \mathbb{R}_{>0}$, the evidence $\Lambda$ at time $t$ is normally distributed with mean $\beta t$ and variance $\sigma^2 t$. The decision is made in favor of $H_1$ if the evidence is greater than a decision threshold $\eta \in \mathbb{R}_{>0}$. Therefore, the conditional probability of the correct decision at time $t$ is

$$\mathbb{P}(\text{say } H_1 \mid H_1, t) = \frac{1}{\sqrt{2\pi\sigma^2 t}} \int_{\eta}^{+\infty} e^{-\frac{(\Lambda - \beta t)^2}{2\sigma^2 t}} \, d\Lambda.$$


Figure 1: The evolution of the probability of the correct decision under (a) Pew's model and (b) the drift-diffusion model. Both curves look similar and are sigmoid.
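As a small numerical companion to Figure 1, the sketch below evaluates both models. The drift-diffusion integral is written as a Gaussian tail, $\frac{1}{2}\operatorname{erfc}\big((\eta - \beta t)/(\sigma\sqrt{2t})\big)$, which is a standard identity. The parameter values ($p_0 = 1$, $a = 1$, $b = 5$ for Pew's model; $\beta = 1$, $\sigma = 1$, $\eta = 3$ for the drift-diffusion model) and the function names are hypothetical, chosen only to display the S-shaped curves.

```python
import math


def pew_probability(t: float, p0: float, a: float, b: float) -> float:
    """Pew's model: P(say H1 | H1, t) = p0 / (1 + exp(-(a*t - b)))."""
    return p0 / (1.0 + math.exp(-(a * t - b)))


def ddm_probability(t: float, beta: float, sigma: float, eta: float) -> float:
    """Drift-diffusion model: P(Lambda > eta) with Lambda ~ N(beta*t, sigma^2*t).

    The Gaussian tail integral equals 0.5 * erfc((eta - beta*t) / (sigma*sqrt(2t))).
    """
    if t <= 0.0:
        return 0.0  # limit as t -> 0+ when eta > 0: no evidence has accumulated yet
    return 0.5 * math.erfc((eta - beta * t) / (sigma * math.sqrt(2.0 * t)))


if __name__ == "__main__":
    # Hypothetical parameters, used only to show the S-shaped curves.
    for t in [0.0, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0]:
        print(f"t={t:5.1f}  Pew={pew_probability(t, 1.0, 1.0, 5.0):.3f}  "
              f"DDM={ddm_probability(t, 1.0, 1.0, 3.0):.3f}")
```

Both probabilities equal 0.5 where the "drift" term balances the threshold: at $t = b/a$ for Pew's model (with $p_0 = 1$) and at $t = \eta/\beta$ for the drift-diffusion model.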
2.2. Sigmoid functions

A twice-differentiable function $f : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ defined by

$$f(t) = f_{\text{cvx}}(t)\,\mathbf{1}(t < t^{\text{inf}}) + f_{\text{cnv}}(t)\,\mathbf{1}(t \ge t^{\text{inf}})$$

is called a sigmoid function, where $f_{\text{cvx}}$ and $f_{\text{cnv}}$ are monotonically increasing convex and concave functions, respectively, $\mathbf{1}(\cdot)$ is the indicator function, and $t^{\text{inf}} \in \mathbb{R}_{>0}$ is the inflection point. The derivative of a sigmoid function is a unimodal function that achieves its maximum at $t^{\text{inf}}$. Further, $f'(0) \ge 0$ and $\lim_{t \to +\infty} f'(t) = 0$. Also, $\lim_{t \to +\infty} f''(t) = 0$. Typical graphs of the first and second derivatives of a sigmoid function are shown in Figure 2. Since the derivative increases on $[0, t^{\text{inf}})$, sigmoid functions do not exhibit diminishing returns, that is, they are not submodular. Note that the evolution of the conditional probability of the correct decision is a sigmoid function in both Pew's model and the drift-diffusion model.


Figure 2: (a) First derivative of the sigmoid function and the penalty rate. A particular value of the derivative may be attained at two different times. The total benefit, that is, the sigmoid reward minus the latency penalty, decreases up to $t_{\min}$, increases from $t_{\min}$ to $t_{\max}$, and then decreases again. (b) Second derivative of the sigmoid function. A particular positive value of the second derivative may be attained at two different times.


2.3. Receding horizon optimization

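Consider the following infinite horizon dynamic optimization problem:

$$\begin{aligned} \text{maximize} \quad & \sum_{\ell=1}^{+\infty} J(x(\ell), u(\ell)) \\ \text{subject to} \quad & x(\ell+1) = \phi(x(\ell), u(\ell)), \quad x(0) \text{ given}, \end{aligned} \tag{1}$$

where $x(\ell), u(\ell) \in \mathbb{R}$ are the state and the control input at time $\ell \in \mathbb{N}$, respectively, $J : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is the stage reward, and $\phi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ defines the nonlinear evolution of the system.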

In receding horizon optimization [6], the optimization problem (1) is approximated at each stage $r \in \mathbb{N}$ by the following finite horizon optimization problem:

$$\begin{aligned} \text{maximize} \quad & \sum_{\ell=r}^{r+N-1} J(x(\ell), u(\ell)) \\ \text{subject to} \quad & x(\ell+1) = \phi(x(\ell), u(\ell)), \quad x(r) \text{ given}, \end{aligned} \tag{2}$$

where $N \in \mathbb{N}$ is a finite horizon length. Receding horizon optimization is summarized in Algorithm 1.


Algorithm 1 Receding horizon optimization
1: at stage $r \in \mathbb{N}$, observe state $x(r)$
2: solve the optimal control problem (2) and compute the optimal control inputs $u^*(r), \dots, u^*(r+N-1)$
3: apply $u^*(r)$ and set $r = r + 1$
4: go to step 1
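Algorithm 1 can be sketched in a few lines when the control set is finite, so that problem (2) can be solved by exhaustive search. Exhaustive search is only one possible solver for (2), and the dynamics $\phi(x, u) = x + u$, the stage reward $J(x, u) = -x^2 - 0.1u^2$, the control set $\{-1, 0, 1\}$, the horizon $N = 3$, and the function name `receding_horizon` are hypothetical choices made for illustration.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def receding_horizon(
    x0: float,
    phi: Callable[[float, float], float],
    stage_reward: Callable[[float, float], float],
    controls: Sequence[float],
    horizon: int,
    num_stages: int,
) -> Tuple[float, List[float]]:
    """Run Algorithm 1, solving the finite horizon problem (2) by enumeration."""
    x = x0
    applied: List[float] = []
    for _ in range(num_stages):
        # Step 1: the current state x = x(r) is observed.
        # Step 2: solve (2) by enumerating every control sequence of length `horizon`.
        best_value, best_seq = float("-inf"), None
        for seq in itertools.product(controls, repeat=horizon):
            value, xi = 0.0, x
            for u in seq:
                value += stage_reward(xi, u)
                xi = phi(xi, u)
            if value > best_value:
                best_value, best_seq = value, seq
        # Step 3: apply only the first optimal input, then move to stage r + 1.
        u0 = best_seq[0]
        applied.append(u0)
        x = phi(x, u0)
    return x, applied


if __name__ == "__main__":
    x_final, inputs = receding_horizon(
        x0=3.0,
        phi=lambda x, u: x + u,
        stage_reward=lambda x, u: -x**2 - 0.1 * u**2,
        controls=(-1, 0, 1),
        horizon=3,
        num_stages=6,
    )
    print(x_final, inputs)
```

Starting from $x(0) = 3$, the sketch steers the state down by one unit per stage and then holds it at zero, applying the inputs $-1, -1, -1, 0, 0, 0$. Only the first input of each optimal sequence is ever applied, which is the defining feature of the receding horizon scheme.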


3. Problem setup

We consider the problem of optimal time-duration allocation for a human operator. The decision making tasks arrive at a given rate and are stacked in a queue. A human operator processes these tasks on a first-come, first-served basis (see Figure 3). The human operator receives a unit reward for a correct decision, while there is no penalty for a wrong decision. We assume that the tasks can be classified according to their difficulty, and the difficulty level takes values in an arbitrary set $\mathcal{D} \subset \mathbb{R}^q$, for some $q \in \mathbb{N}$. For a decision made after processing a task with difficulty $d \in \mathcal{D}$ for time $t$, the expected reward is

$$\mathbb{E}[\mathbf{1}_{\text{say } H_1} \mid H_1, t] = \mathbb{P}(\text{say } H_1 \mid H_1, t) = f_d(t), \tag{3}$$

where $f_d : \mathbb{R}_{\ge 0} \to (0, 1)$ is the sigmoid function associated with the task. Note that this reward structure corresponds to the expected number of correct decisions.

We consider two particular problems. First, in Section 4, we consider a static queue with latency penalty, that is, the scenario where the human operator has to perform $N \in \mathbb{N}$ decision making tasks, but each task loses value at a constant rate per unit delay in its processing. Second, in Sections 5, 6, and 7, we consider a dynamic queue of decision making tasks where each task loses value at a constant rate per unit delay in its processing. The loss in the value of a task may occur due to the processing deadline on the task. In other words, the latency penalty is a soft constraint that captures the processing deadline on the task. For such a decision making queue, we are interested in the optimal time-duration allocation to each task. Alternatively, we are interested in the task release rate that will result in the desired accuracy for each task. We intend to design a decision support system that tells the human operator the optimal time-duration allocation to each task.

Remark 1 (Soft constraints versus hard constraints). The processing deadlines on the tasks can be incorporated as hard constraints as well, but the resulting optimization problem is combinatorially hard. For instance, if the performance of the human operator is modeled by a step function with the jump at the inflection point and the deadlines are incorporated as hard constraints, then the resulting optimization problem is equivalent to the $N$-dimensional knapsack problem [14]. The $N$-dimensional knapsack problem is NP-hard and admits no fully polynomial time approximation algorithm for $N \ge 2$. The standard approximation algorithm [14] for this problem has a factor of optimality $N + 1$ and hence, for large $N$, may yield results very far from the optimum. The close connections between knapsack problems with step functions and with sigmoid functions (see [24]) suggest that efficient approximation algorithms may not exist for the problem formulation where processing deadlines are modeled as hard constraints. □


Figure 3: Problem setup. The decision making tasks arrive at a rate $\lambda$. These tasks are served by a human operator with sigmoid performance. Each task loses value while waiting in the queue.


4. Static queue with latency penalty

4.1. Problem description

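Consider that the human operator has to perform $N \in \mathbb{N}$ decision making tasks in a prescribed order (the task labeled "1" should be processed first, etc.). Let the human operator allocate duration $t_\ell$ to task $\ell \in \{1, \dots, N\}$. Let the difficulty of task $\ell$ be $d_\ell \in \mathcal{D}$. According to the importance of the task, a weight $w_\ell \in \mathbb{R}_{\ge 0}$ is assigned to task $\ell$. The operator receives an expected reward $w_\ell f_{d_\ell}(t_\ell)$ for allocating duration $t_\ell$ to task $\ell$, while she incurs a latency penalty $c_\ell$ per unit time for the delay in its processing. Therefore, the expected benefit for task $\ell$ is $w_\ell f_{d_\ell}(t_\ell) - c_\ell(t_1 + \dots + t_\ell)$. Summing over all tasks and collecting the terms that multiply each $t_\ell$, the duration $t_\ell$ delays tasks $\ell, \dots, N$ and is therefore penalized at the rate $c_\ell + \dots + c_N$. The objective of the human operator is to maximize her average benefit, and the associated optimization problem is

$$\underset{\mathbf{t} \in \mathbb{R}^N_{\ge 0}}{\text{maximize}} \quad \frac{1}{N} \sum_{\ell=1}^{N} \big( w_\ell f_{d_\ell}(t_\ell) - (c_\ell + \dots + c_N)\, t_\ell \big), \tag{4}$$

where $\mathbf{t} = \{t_1, \dots, t_N\}$ is the duration allocation vector.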


4.2. Optimal solution


We start by establishing some properties of sigmoid functions. We study the optimization problem involving a sigmoid reward function and a linear latency penalty. In particular, given a sigmoid function $f$ and a penalty rate $c \in \mathbb{R}_{>0}$, we wish to solve the following problem:

$$\underset{t \in \mathbb{R}_{\ge 0}}{\text{maximize}} \quad f(t) - c t. \tag{5}$$


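The derivative of a sigmoid function is not a one-to-one mapping and hence is not invertible. We define the pseudo-inverse of the derivative of a sigmoid function $f$ with inflection point $t^{\text{inf}}$, $f^{\dagger} : \mathbb{R}_{>0} \to \mathbb{R}_{\ge 0}$, by

$$f^{\dagger}(y) = \begin{cases} \max\{t \in \mathbb{R}_{\ge 0} \mid f'(t) = y\}, & \text{if } y \in (0, f'(t^{\text{inf}})], \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

Notice that the definition of the pseudo-inverse is consistent with Figure 2(a): of the (at most) two times at which $f'$ attains a given value, $f^{\dagger}$ selects the later one.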
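For the logistic sigmoid $f(t) = 1/(1 + \exp(-a t + b))$ with $a, b > 0$ (the family used in the examples of this section), the pseudo-inverse (6) has a closed form. Writing $s = f(t)$, we have $f'(t) = a s (1 - s)$, which peaks at $f'(t^{\text{inf}}) = a/4$ with $t^{\text{inf}} = b/a$. Because $f$ is increasing, the larger root $s$ of $a s(1-s) = y$ gives the later of the two times. The sketch below assumes this logistic family. For a general sigmoid one would instead solve $f'(t) = y$ numerically on $[t^{\text{inf}}, +\infty)$, where $f'$ is decreasing.

```python
import math


def logistic(t: float, a: float, b: float) -> float:
    """Logistic sigmoid f(t) = 1 / (1 + exp(-a*t + b)), inflection point b/a."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def logistic_derivative(t: float, a: float, b: float) -> float:
    """f'(t) = a * f(t) * (1 - f(t)) for the logistic sigmoid."""
    s = logistic(t, a, b)
    return a * s * (1.0 - s)


def pseudo_inverse(y: float, a: float, b: float) -> float:
    """f^dagger(y) from (6) for the logistic sigmoid, assuming a > 0 and b > 0."""
    if y <= 0.0 or y > a / 4.0:  # a/4 = f'(t_inf), the peak of f'
        return 0.0
    r = math.sqrt(1.0 - 4.0 * y / a)
    # For the larger root s = (1 + r)/2 of a*s*(1-s) = y, the odds s/(1-s)
    # equal a*(1+r)^2/(4y); this form avoids cancellation when y is small.
    return (b + math.log(a * (1.0 + r) ** 2 / (4.0 * y))) / a
```

For example, with $a = 1$ and $b = 5$, `pseudo_inverse(0.1, 1.0, 5.0)` returns roughly 7.063, the later of the two times at which $f' = 0.1$, while any $y > 1/4$ is mapped to 0.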


Lemma 1 (Sigmoid function and linear penalty). For the optimization problem (5), the optimal solution $t^*$ is

$$t^* \in \operatorname{argmax}\{ f(\beta) - c\beta \mid \beta \in \{0, f^{\dagger}(c)\} \}.$$

Proof. The global maximum lies at a point where the first derivative is zero or at the boundary of the feasible set. The first derivative of the objective function is $f'(t) - c$. If $f'(t^{\text{inf}}) < c$, then the objective function is a decreasing function of time and the maximum is achieved at $t^* = 0$. Otherwise, a critical point is obtained by setting the first derivative to zero. We note that $f'(t) = c$ has at most two roots. The second derivative condition yields that if there exist two roots, then only the larger of the two roots corresponds to a local maximum. Otherwise, the only root corresponds to a local maximum. The global maximum is determined by comparing the local maximum with the value of the objective function at the boundary $t = 0$. This completes the proof. □
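Lemma 1 reduces problem (5) to a comparison of two candidates. A minimal sketch for the logistic sigmoid (same assumption as above, $a, b > 0$) follows; it re-declares its helpers so that it runs on its own. When the two candidates tie, (5) has two maximizers, and the sketch returns $t = 0$, which is an arbitrary choice.

```python
import math


def logistic(t, a, b):
    """Logistic sigmoid f(t) = 1 / (1 + exp(-a*t + b))."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def pseudo_inverse(y, a, b):
    """f^dagger(y) of (6) for the logistic sigmoid with a, b > 0."""
    if y <= 0.0 or y > a / 4.0:
        return 0.0
    r = math.sqrt(1.0 - 4.0 * y / a)
    return (b + math.log(a * (1.0 + r) ** 2 / (4.0 * y))) / a


def optimal_duration(c, a, b):
    """Lemma 1: t* in argmax{ f(beta) - c*beta : beta in {0, f^dagger(c)} }."""
    beta = pseudo_inverse(c, a, b)          # interior candidate (0 if none exists)
    value_zero = logistic(0.0, a, b)        # objective at the boundary t = 0
    value_beta = logistic(beta, a, b) - c * beta
    return beta if value_beta > value_zero else 0.0
```

With $a = 1$ and $b = 5$, the sketch returns about 7.06 for $c = 0.1$ and 0 for $c = 0.2$; a brute-force search of $f(t) - ct$ over a fine grid of $t$ agrees with both answers.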

Definition 1 (Critical penalty rate). For a given sigmoid function $f$ and penalty rate $c \in \mathbb{R}_{>0}$, let the solution of problem (5) be $t^*_{f,c}$. The critical penalty rate $\psi_f$ is defined by

$$\psi_f = \sup\{ c \in \mathbb{R}_{>0} \mid t^*_{f,c} \in \mathbb{R}_{>0} \}. \tag{7}$$

Note that the critical penalty rate is the slope of the tangent to the sigmoid function $f$ from the origin. □
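Definition 1 can be evaluated numerically. Along the branch $t = f^{\dagger}(c)$, the gap $g(c) = f(f^{\dagger}(c)) - c f^{\dagger}(c) - f(0)$ between the interior and boundary candidates of Lemma 1 has derivative $-f^{\dagger}(c) < 0$, so it is decreasing in $c$ and $\psi_f$ is where it changes sign; bisection therefore applies. The sketch below assumes the logistic family with $a, b > 0$. Such an $f$ has $f(0) > 0$, so the sketch compares against $f(0)$ exactly as Lemma 1 does; geometrically this is the tangent drawn from the point $(0, f(0))$, which reduces to the tangent from the origin when $f(0) = 0$. The names `value_gap` and `critical_penalty_rate` are hypothetical.

```python
import math


def logistic(t, a, b):
    """Logistic sigmoid f(t) = 1 / (1 + exp(-a*t + b))."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def pseudo_inverse(y, a, b):
    """f^dagger(y) of (6) for the logistic sigmoid with a, b > 0."""
    if y <= 0.0 or y > a / 4.0:
        return 0.0
    r = math.sqrt(1.0 - 4.0 * y / a)
    return (b + math.log(a * (1.0 + r) ** 2 / (4.0 * y))) / a


def value_gap(c, a, b):
    """g(c): value of the interior candidate of Lemma 1 minus the boundary value f(0)."""
    t = pseudo_inverse(c, a, b)
    return logistic(t, a, b) - c * t - logistic(0.0, a, b)


def critical_penalty_rate(a, b, tol=1e-12):
    """psi_f of (7) by bisection on the decreasing gap g over (0, a/4]."""
    hi = a / 4.0  # beyond f'(t_inf) = a/4 the interior candidate disappears
    if value_gap(hi, a, b) > 0.0:
        return hi
    lo = 1e-9 * hi  # tiny penalty: long allocation, reward near 1 > f(0), so g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if value_gap(mid, a, b) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For the reward $f(t) = 1/(1 + \exp(5 - t))$ used in Example 1 below, the sketch gives $\psi_f \approx 0.125$: penalty rates slightly below this value yield a positive optimal allocation, and rates slightly above it yield zero.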

The optimal solution to problem (5) for different values of the penalty rate $c$ is shown in Figure 4. Notice that the optimal solution jumps down to zero at the critical penalty rate. This jump in the optimal allocation gives rise to combinatorial effects in problems involving multiple sigmoid functions.

Figure 4: Optimal solution to problem (5) as a function of the linear penalty rate $c$. The optimal solution $t^* \to +\infty$ as the penalty rate $c \to 0^+$.


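We can now analyze optimization problem (4). Its objective is a sum of terms that each depend on a single $t_\ell$, so it can be maximized term by term; for $w_\ell > 0$, each term divided by $w_\ell$ is an instance of problem (5) with penalty rate $(c_\ell + \dots + c_N)/w_\ell$.

Theorem 2 (Static queue with latency penalty). For the optimization problem (4), the optimal allocation to task $\ell \in \{1, \dots, N\}$ is

$$t^*_\ell \in \operatorname{argmax}\Big\{ w_\ell f_{d_\ell}(\beta) - (c_\ell + \dots + c_N)\beta \;\Big|\; \beta \in \big\{0, f^{\dagger}_{d_\ell}\big((c_\ell + \dots + c_N)/w_\ell\big)\big\} \Big\}.$$

Proof. The proof is similar to the proof of Lemma 1. □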

Remark 2 (Comparison with a concave utility). The optimal duration allocation for the static queue with latency penalty decreases to a critical value with increasing penalty rate and then jumps down to zero. In contrast, if the performance function is concave instead of sigmoid, then the optimal duration allocation decreases continuously to zero with increasing penalty rate. □
Example 1 (Static queue and homogeneous tasks). The human operator has to serve $N = 10$ tasks and receives an expected reward $f(t) = 1/(1 + \exp(5 - t))$ for an allocation of duration $t$ seconds to a task, while she incurs a penalty $c = 0.02$ per second for each pending task. The optimal policy according to Theorem 2 is shown in Figure 5(a). The optimal policy drops some tasks initially and then processes the remaining tasks. The duration allocation increases as the number of pending tasks decreases. □
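Theorem 2 applies Lemma 1 task by task with the normalized rate $(c_\ell + \dots + c_N)/w_\ell$. The sketch below implements it for logistic sigmoids $f_{d_\ell}(t) = 1/(1 + \exp(-a_\ell t + b_\ell))$ with $a_\ell, b_\ell > 0$ (an assumption matching the examples), and runs it on the data of Example 1: $a_\ell = 1$, $b_\ell = 5$, $w_\ell = 1$, and $c_\ell = 0.02$, so that task $\ell$ is penalized at the cumulative rate $0.02(N - \ell + 1)$. The function name `static_queue_allocation` is hypothetical.

```python
import math
from typing import List, Sequence


def logistic(t, a, b):
    """Logistic sigmoid f(t) = 1 / (1 + exp(-a*t + b))."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def pseudo_inverse(y, a, b):
    """f^dagger(y) of (6) for the logistic sigmoid with a, b > 0."""
    if y <= 0.0 or y > a / 4.0:
        return 0.0
    r = math.sqrt(1.0 - 4.0 * y / a)
    return (b + math.log(a * (1.0 + r) ** 2 / (4.0 * y))) / a


def static_queue_allocation(a: Sequence[float], b: Sequence[float],
                            w: Sequence[float], c: Sequence[float]) -> List[float]:
    """Theorem 2: optimal durations for a static queue with latency penalty."""
    allocations = []
    for ell in range(len(a)):
        rate = sum(c[ell:])  # c_ell + ... + c_N: time spent on task ell delays tasks ell..N
        beta = pseudo_inverse(rate / w[ell], a[ell], b[ell])
        benefit = w[ell] * logistic(beta, a[ell], b[ell]) - rate * beta
        keep = benefit > w[ell] * logistic(0.0, a[ell], b[ell])  # compare with beta = 0
        allocations.append(beta if keep else 0.0)
    return allocations


if __name__ == "__main__":
    n = 10  # Example 1: homogeneous tasks
    alloc = static_queue_allocation([1.0] * n, [5.0] * n, [1.0] * n, [0.02] * n)
    print([round(t, 2) for t in alloc])
```

For these inputs the sketch drops tasks 1 to 4, whose cumulative penalty rates (0.20 down to 0.14) exceed the critical rate of about 0.125, and allocates increasing durations from about 6.82 s (task 5) to about 8.87 s (task 10), matching the qualitative description above.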
Example 2 (Static queue and heterogeneous tasks). The human operator has to serve $N = 10$ heterogeneous tasks and receives an expected reward $f_{d_\ell}(t) = 1/(1 + \exp(-a_\ell t + b_\ell))$ for an allocation of duration $t$ seconds to task $\ell$, where $d_\ell$ is characterized by the pair $(a_\ell, b_\ell)$. The parameters and the weights associated with the tasks are

$$\begin{aligned} (a_1, \dots, a_N) &= (1, 2, 1, 3, 2, 4, 1, 5, 3, 6), \\ (b_1, \dots, b_N) &= (5, 10, 3, 9, 8, 16, 6, 30, 6, 12), \text{ and} \\ (w_1, \dots, w_N) &= (2, 5, 7, 4, 9, 3, 5, 10, 13, 6). \end{aligned}$$

Let the vector of penalty rates be

$$\mathbf{c} = (0.09, 0.21, 0.21, 0.06, 0.03, 0.15, 0.3, 0.09, 0.18, 0.06)$$

per second. The optimal allocations are shown in Figure 5(b). The importance and difficulty level of a task are encoded in the associated weight and the inflection point of the associated sigmoid function, respectively. The optimal allocations depend on the difficulty level, the penalty rate, and the importance of the tasks. For instance, task 6 is a relatively simple but less important task and is dropped. In contrast, task 8 is a relatively difficult but very important task and is processed. □
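The same computation applies to the heterogeneous data of Example 2. The sketch below repeats the helpers of the Theorem 2 computation so that it runs on its own (the function and variable names are hypothetical) and computes the allocation of every task from the parameters listed above.

```python
import math

# Data of Example 2.
A_PARAMS = [1, 2, 1, 3, 2, 4, 1, 5, 3, 6]
B_PARAMS = [5, 10, 3, 9, 8, 16, 6, 30, 6, 12]
WEIGHTS = [2, 5, 7, 4, 9, 3, 5, 10, 13, 6]
RATES = [0.09, 0.21, 0.21, 0.06, 0.03, 0.15, 0.3, 0.09, 0.18, 0.06]


def logistic(t, a, b):
    """Logistic sigmoid f(t) = 1 / (1 + exp(-a*t + b))."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def pseudo_inverse(y, a, b):
    """f^dagger(y) of (6) for the logistic sigmoid with a, b > 0."""
    if y <= 0.0 or y > a / 4.0:
        return 0.0
    r = math.sqrt(1.0 - 4.0 * y / a)
    return (b + math.log(a * (1.0 + r) ** 2 / (4.0 * y))) / a


def static_queue_allocation(a, b, w, c):
    """Theorem 2 applied task by task with cumulative rate (c_ell + ... + c_N) / w_ell."""
    allocations = []
    for ell in range(len(a)):
        rate = sum(c[ell:])
        beta = pseudo_inverse(rate / w[ell], a[ell], b[ell])
        benefit = w[ell] * logistic(beta, a[ell], b[ell]) - rate * beta
        keep = benefit > w[ell] * logistic(0.0, a[ell], b[ell])
        allocations.append(beta if keep else 0.0)
    return allocations


if __name__ == "__main__":
    for task, t in enumerate(static_queue_allocation(A_PARAMS, B_PARAMS, WEIGHTS, RATES), start=1):
        print(f"task {task}: {t:.2f} s")
```

For these inputs the sketch drops task 6 (cumulative rate 0.78, weight 3) and allocates about 7.0 s to task 8 (cumulative rate 0.33, weight 10), consistent with the discussion above.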
Figure 5: Static queue with latency penalty: (a) homogeneous tasks, (b) heterogeneous tasks (horizontal axis: task). For homogeneous tasks, the optimal policy drops some tasks initially and then processes the remaining tasks. The duration allocation increases with decreasing queue length. For heterogeneous tasks, the optimal allocations depend on the difficulty level, the penalty rate, and the importance of the tasks.


5. Dynamic queue with latency penalty: problem description and properties of optimal solution

In the previous section, we developed policies for the static queue with latency penalty. We now consider the dynamic queue with latency penalty, that is, the scenario where the tasks arrive according to a stochastic process and wait in a queue to be processed. We assume the tasks lose value while waiting in the queue. The operator's objective is to maximize her infinite horizon reward. In the following, we pose the problem as an MDP and show that the infinite horizon average value formulation of the MDP is equivalent to a deterministic dynamic optimization problem.


5.1. Problem description

Assume that the human operator has to serve a queue of decision making tasks arriving according to a Poisson process with rate $\lambda \in \mathbb{R}_{>0}$. We assume that each task is sampled from a probability distribution function $p : \mathcal{D} \to \mathbb{R}_{\ge 0}$, where $\mathcal{D} \subset \mathbb{R}^q$ is the set of difficulty levels of the tasks. Each task is assigned a weight based on its importance, and two tasks that are equally difficult may have different weights. To capture this feature, we assume that the weight associated with a task with difficulty level $d$ is a random variable $w_d \in \mathbb{R}_{>0}$ with probability distribution function $p^w_d : [w_d^{\min}, w_d^{\max}] \to \mathbb{R}_{\ge 0}$, where $w_d^{\min}, w_d^{\max} \in \mathbb{R}_{>0}$ are given constants. Similarly, let the latency penalty rate associated with a task with difficulty level $d$ be a random variable $c_d \in \mathbb{R}_{>0}$ with probability distribution function $p^c_d : [c_d^{\min}, c_d^{\max}] \to \mathbb{R}_{\ge 0}$, where $c_d^{\min}, c_d^{\max} \in \mathbb{R}_{>0}$ are given constants. Let the realized difficulty level, importance, and latency penalty rate for task $\ell$ be $d_\ell$, $w_{d_\ell}$, and $c_{d_\ell}$, respectively. Thus, the operator receives an expected reward $w_{d_\ell} f_{d_\ell}(t_\ell)$ for a duration allocation $t_\ell$ to task $\ell$, while she incurs a latency penalty $c_{d_\ell}$ per unit time for the delay in its processing. Note that while designing a decision making queue, the true realizations of the random variables are not known and only expected values are at the designer's disposal. Therefore, we construct the value function with the expected values over realizations of the queue. We define the expected reward function $\bar f : \mathbb{R}_{\ge 0} \to (0, 1)$ and the expected penalty rate $\bar c \in \mathbb{R}_{>0}$ by

$$\bar f(t) = \frac{1}{\bar w}\,\mathbb{E}_p\big[\mathbb{E}_{p^w_d}[w_d]\, f_d(t)\big] \quad \text{and} \quad \bar c = \mathbb{E}_p\big[\mathbb{E}_{p^c_d}[c_d]\big],$$

respectively, where $\bar w = \mathbb{E}_p\big[\mathbb{E}_{p^w_d}[w_d]\big]$ and $\mathbb{E}_{*}[\cdot]$ denotes the expected value with respect to the measure $*$. Note that these expressions assume that $w_d$, $d$, and $c_d$ are statistically independent.

We denote the queue length at the beginning of processing task $\ell \in \mathbb{N}$ by $n_\ell \in \mathbb{Z}_{\ge 0}$. The objective of the operator is to maximize her infinite horizon expected reward. For any allocation $t_\ell$ to task $\ell$, the queue length evolves according to a Poisson process and hence is Markovian. We now formulate the optimization problem as an MDP, namely $\Gamma$, with the set of durations that can be allocated as the action space and the difficulty levels of the tasks in the queue as the state space. We define the reward $r_\ell$ obtained by allocating duration $t_\ell$ to task $\ell$ by

$$r_\ell(d_\ell, \mathbf{c}_\ell, w_{d_\ell}, t_\ell) = w_{d_\ell} f_{d_\ell}(t_\ell) - \frac{1}{2}\Big( \sum_{i=\ell}^{\ell+n_\ell-1} c_{d_i} + \sum_{j=\ell}^{\ell+n'_\ell-1} c_{d_j} \Big) t_\ell,$$

where $n'_\ell \in \mathbb{N}$ is the queue length just before the end of processing task $\ell \in \mathbb{N}$, and $\mathbf{c}_\ell$ is the vector of penalty rates for the tasks in the queue. Since the queue length may change while a task is processed, the latency penalty is computed as the average of the latency penalty for the tasks present at the start of processing the task and the latency penalty for the tasks present at the end of processing the task. Such averaging is consistent with the expected number of arrivals of a Poisson process being a linear function of time.
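The averaged penalty in $r_\ell$ can be checked by simulation. Since $n'_\ell$ equals $n_\ell$ plus the number of arrivals during $t_\ell$, whose mean is $\lambda t_\ell$, the expected penalty is $\bar c\,(n_\ell + \lambda t_\ell/2)\, t_\ell$ when the penalty rates are i.i.d. with mean $\bar c$ and independent of the arrivals. The sketch below assumes uniformly distributed penalty rates on a hypothetical interval $[c^{\min}, c^{\max}]$ and uses Python's standard `random` module; all names and numbers are illustrative.

```python
import random
from typing import Sequence


def averaged_latency_penalty(rates: Sequence[float], n_start: int, n_end: int, t: float) -> float:
    """Penalty term of r_ell: mean of the rate sums over tasks present at the start
    (first n_start entries) and at the end (first n_end entries), times t."""
    return 0.5 * (sum(rates[:n_start]) + sum(rates[:n_end])) * t


def poisson_count(lam: float, t: float, rng: random.Random) -> int:
    """Number of Poisson(lam) arrivals in [0, t], via exponential inter-arrival times."""
    count, clock = 0, rng.expovariate(lam)
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count


def mean_penalty(n_start, lam, t, c_min, c_max, samples=50_000, seed=1):
    """Monte Carlo estimate of the expected averaged latency penalty for one task."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        n_end = n_start + poisson_count(lam, t, rng)
        rates = [rng.uniform(c_min, c_max) for _ in range(n_end)]
        total += averaged_latency_penalty(rates, n_start, n_end, t)
    return total / samples


if __name__ == "__main__":
    n, lam, t, c_min, c_max = 3, 0.5, 4.0, 0.0, 1.0
    c_bar = 0.5 * (c_min + c_max)
    print(mean_penalty(n, lam, t, c_min, c_max), c_bar * (n + lam * t / 2) * t)
```

With $n_\ell = 3$, $\lambda = 0.5$, $t_\ell = 4$, and $\bar c = 0.5$, the formula gives 8.0, and the Monte Carlo estimate lands close to it.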
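For a duration allocation $t_\ell$ to task $\ell \in \mathbb{N}$, the transition probability from queue length $n_\ell \in \mathbb{Z}_{\ge 0}$ to queue length $n_{\ell+1} \in \mathbb{Z}_{\ge 0}$ is

$$p(n_{\ell+1} \mid n_\ell, t_\ell) = \begin{cases} 0, & \text{if } n_{\ell+1} < n_\ell - 1, \\ \dfrac{(\lambda t_\ell)^{n_{\ell+1} - n_\ell + 1} e^{-\lambda t_\ell}}{(n_{\ell+1} - n_\ell + 1)!}, & \text{otherwise}. \end{cases}$$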
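The transition law encodes that, while task $\ell$ is processed, one task leaves the queue and a Poisson number of tasks with mean $\lambda t_\ell$ arrives. A minimal sketch follows; the name `transition_probability` is hypothetical, and the empty-queue boundary is not treated, in line with Assumption 1.

```python
import math


def transition_probability(n_now: int, n_next: int, lam: float, t: float) -> float:
    """P(n_{ell+1} = n_next | n_ell = n_now, t_ell = t) for Poisson(lam) arrivals."""
    k = n_next - n_now + 1  # arrivals needed, since the processed task leaves the queue
    if k < 0:
        return 0.0
    mu = lam * t
    return mu**k * math.exp(-mu) / math.factorial(k)


if __name__ == "__main__":
    n_now, lam, t = 5, 0.5, 4.0
    support = range(n_now - 1, n_now + 60)  # tail beyond 60 arrivals is negligible here
    probs = [transition_probability(n_now, m, lam, t) for m in support]
    # Expect a total of ~1.0 and a mean next queue length of n_now - 1 + lam * t.
    print(sum(probs), sum(m * p for m, p in zip(support, probs)))
```

For $n_\ell = 5$ and $\lambda t_\ell = 2$, the probabilities sum to one and the mean next queue length is $5 - 1 + 2 = 6$, which is the expected evolution used later in the proof of Lemma 3.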

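The MDP with finite horizon length $N \in \mathbb{N}$ maximizes the value function $V_N : \mathbb{N} \times \mathbb{R}^N_{\ge 0} \to \mathbb{R}$ defined by

$$V_N(n_1, \mathbf{t}) = \sum_{\ell=1}^{N} \mathbb{E}[r_\ell(d_\ell, \mathbf{c}_\ell, w_{d_\ell}, t_\ell) \mid n_1], \tag{8}$$

where $\mathbf{t}$ is the vector of allocations to each task and $n_1$ is the initial queue length.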

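The infinite horizon average value function of the MDP, denoted by $V_{\text{avg}} : \mathbb{N} \times \mathbb{R}^{\mathbb{N}}_{\ge 0} \to \mathbb{R}$, is defined by

$$V_{\text{avg}}(n_1, \mathbf{t}) = \lim_{N \to +\infty} \frac{1}{N} V_N(n_1, \mathbf{t}).$$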


We study the MDP $\Gamma$ under the following assumptions:

Assumption 1 (Non-empty queue). Without loss of generality, we assume that the queue is never empty. If the queue is empty at some stage, then the operator waits for the next task to arrive, and there is no penalty for such waiting time.

Assumption 2 (Sigmoid average performance). We assume that the average $\bar f$ of the sigmoid functions is a sigmoid function.

Remark 3 (Sigmoid average performance). Assumption 2 is justified in several contexts. For empirically obtained sigmoid functions, $\bar f$ can be obtained by fitting a sigmoid function through the averaged empirical data. In the context of decision making tasks, the performance of the operator on each task is modeled by a drift-diffusion process, and the average of a set of drift-diffusion processes is again a drift-diffusion process. Hence, the average performance is well modeled by a sigmoid function. □


5.2. Properties of optimal solution


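We now study some properties of the MDP $\Gamma$ and its solution that will be used later in the paper. Before we establish these properties, we introduce the following optimization problem, which we refer to as the certainty-equivalent [1] problem:

$$\begin{aligned} \underset{t}{\text{maximize}} \quad & \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \bar c\, \mathbb{E}[n_\ell \mid n_1]\, t_\ell - \frac{\bar c \lambda t_\ell^2}{2}\Big) \\ \text{subject to} \quad & \mathbb{E}[n_{\ell+1} \mid n_1] = \max\{0, \mathbb{E}[n_\ell \mid n_1] - 1 + \lambda t_\ell\}, \\ & t_\ell \ge 0, \quad \forall \ell \in \mathbb{N}. \end{aligned} \qquad (9)$$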
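The certainty-equivalent problem (9) replaces the random queue length by its expected value, which then evolves deterministically. A minimal sketch, assuming a generic callable `f` standing in for $\bar f$ and truncating the limit at a finite number of stages (the function name is hypothetical):

```python
from typing import Callable, Sequence


def ce_average_value(t: Sequence[float], n1: float, w: float, c: float,
                     lam: float, f: Callable[[float], float]) -> float:
    """Finite-stage average of the certainty-equivalent objective in (9)."""
    expected_n = float(n1)  # E[n_1 | n_1] = n_1
    total = 0.0
    for t_l in t:
        total += w * f(t_l) - c * expected_n * t_l - c * lam * t_l**2 / 2
        # Deterministic expected-queue update from the constraint in (9).
        expected_n = max(0.0, expected_n - 1 + lam * t_l)
    return total / len(t)


# With lam * t = 1, exactly one task is expected to arrive per service,
# so the expected queue stays at one task in every stage.
print(ce_average_value([2.0] * 5, 1, 1.0, 0.1, 0.5, lambda s: 1.0))
```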


We also define $N_{\max} = \lfloor \bar w \psi_{\bar f} / \bar c \rfloor$. We will show that $N_{\max}$ is the maximum queue length at which the optimal policy allocates non-zero duration to the first task. We now state some properties of the MDP $\Gamma$:

Lemma 3 (Properties of MDP $\Gamma$). Under Assumptions 1 and 2, the following statements hold for the MDP $\Gamma$ and its infinite horizon average value function:

(i). the MDP $\Gamma$ admits the same optimal policy as (9);

(ii). the optimal policy allocates zero duration to the first task if $n_1 > N_{\max}$;

(iii). the optimal policy allocates a duration of at most $\bar f^\dagger(\bar c/\bar w)$ to each task.


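Proof. We start with the definition of $V_{\text{avg}}$:

$$\begin{aligned}
V_{\text{avg}}(n_1, t) &= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \mathbb{E}\big[r(d_\ell, n_\ell, n'_\ell, t_\ell) \mid n_1\big] \\
&= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \mathbb{E}\Big[w_{d_\ell} f_{d_\ell}(t_\ell) - \frac{1}{2}\Big(\sum_{i=\ell}^{\ell+n_\ell-1} c_{d_i} + \sum_{j=\ell}^{\ell+n'_\ell-1} c_{d_j}\Big) t_\ell \;\Big|\; n_1\Big] \\
&= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \frac{1}{2} \bar c\, \mathbb{E}[n_\ell + n'_\ell \mid n_1]\, t_\ell\Big) \qquad (10) \\
&= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \frac{1}{2} \bar c \big(2\mathbb{E}[n_\ell \mid n_1] + \lambda t_\ell\big) t_\ell\Big) \\
&= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \bar c\, \mathbb{E}[n_\ell \mid n_1]\, t_\ell - \frac{1}{2} \bar c \lambda t_\ell^2\Big),
\end{aligned}$$

where equation (10) follows from Wald's identity [9], and the expected evolution of the queue length is determined by the Poisson arrival and the deterministic service processes, that is,

$$\mathbb{E}[n_{\ell+1} \mid n_1] = \max\{0, \mathbb{E}[n_\ell \mid n_1] - 1 + \lambda t_\ell\}, \quad \forall \ell \in \{1, \dots, N\}.$$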


Therefore,  the  infinite  horizon  average  value  formulation  of  the 
MDP  and  the  certainty-equivalent  problem  are  identical.  This 
establishes  the  first  statement. 


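To prove the second statement, we note that under Assumption 1, $\mathbb{E}[n_\ell \mid n_1] = n_1 - \ell + 1 + \lambda \sum_{j=1}^{\ell-1} t_j$, and thus the value function is

$$V_N(n_1, t) = \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \bar c (n_1 - \ell + 1) t_\ell - \bar c \lambda t_\ell \sum_{j=1}^{\ell-1} t_j - \frac{\bar c \lambda t_\ell^2}{2}\Big).$$

We write $V_N = V_{\text{one}} + V_{\text{rem}}$, where $V_{\text{one}} : \mathbb{N} \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ and $V_{\text{rem}} : \mathbb{N} \times \mathbb{R}^N_{\ge 0} \to \mathbb{R}$ are defined by

$$\begin{aligned} V_{\text{one}}(n_1, t_1) &= \bar w \bar f(t_1) - \bar c n_1 t_1, \\ V_{\text{rem}}(n_1, t) &= \sum_{\ell=2}^{N} \Big(\bar w \bar f(t_\ell) - \bar c (n_1 - \ell + 1) t_\ell - \bar c \lambda t_\ell \sum_{j=1}^{\ell-1} t_j\Big) - \sum_{\ell=1}^{N} \frac{\bar c \lambda t_\ell^2}{2}. \end{aligned}$$

Note that $V_{\text{rem}}$ is a decreasing function of $t_1$, and from Lemma 1 we know that, for $\bar c n_1/\bar w > \psi_{\bar f}$, $V_{\text{one}}$ achieves its global maximum at $t_1 = 0$. Hence, $V_N$ achieves its maximum at $t_1 = 0$ for $\bar c n_1/\bar w > \psi_{\bar f}$, that is, the optimal policy drops the first task if $n_1 > \bar w \psi_{\bar f}/\bar c$. Since $n_1$ is a non-negative integer, $n_1 > \bar w \psi_{\bar f}/\bar c$ is equivalent to $n_1 > N_{\max}$.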

To establish the last statement, we note that the function $V_{\text{one}}$ is a decreasing function of $t_1$ for all $t_1 > \bar f^\dagger(\bar c/\bar w)$, and $V_{\text{rem}}$ is a decreasing function of $t_1$ for all $t_1 > 0$. Hence, the maximum allocation to the first task, and by the same argument applied at each stage to any task, is $\bar f^\dagger(\bar c/\bar w)$. □
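Lemma 3 can be explored numerically. The sketch below assumes a logistic sigmoid $\bar f(t) = 1/(1 + e^{-at+b})$ (parameters borrowed from Example 3; any sigmoid would do), computes $\bar f^\dagger$ in closed form, and decides whether the first task is processed by comparing the two candidates $0$ and $\bar f^\dagger(\bar c n_1/\bar w)$, as in the proof of Theorem 7. Scanning $n_1$ then yields $N_{\max}$ without an explicit formula for $\psi_{\bar f}$. All helper names are hypothetical.

```python
import math

A, B = 1.0853, 4.3027  # assumed logistic parameters (values from Example 3)


def f(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-A * t + B))


def f_dagger(y: float) -> float:
    """Largest t >= 0 with f'(t) = y, or 0 if y is outside (0, max f']."""
    if y <= 0.0 or y > A / 4.0:  # max of f' is A/4, attained at t = B/A
        return 0.0
    # f'(t) = A s (1 - s) with s = f(t); take the root on the decreasing branch.
    s = 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * y / A)))
    return (B + math.log(s / (1.0 - s))) / A


def first_task_allocation(n1: int, c: float, w: float) -> float:
    """Maximizer of w f(t) - c n1 t over the candidates {0, f_dagger(c n1 / w)}."""
    t_star = f_dagger(c * n1 / w)
    return t_star if w * f(t_star) - c * n1 * t_star >= w * f(0.0) else 0.0


def n_max(c: float, w: float, limit: int = 10_000) -> int:
    """Largest queue length at which the first task still gets positive time."""
    n = 0
    while n < limit and first_task_allocation(n + 1, c, w) > 0.0:
        n += 1
    return n


c, w = 0.1380, 6.4
print(f_dagger(c / w), n_max(c, w))
```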


One of the key implications of Lemma 3 is that the solution of the MDP $\Gamma$ is identical to the solution of a deterministic dynamic optimization problem. Although this reduces the computational complexity significantly, the computational cost to determine the optimal policy still grows exponentially with the size of the state space and the action space. We now exploit the results in Lemma 3 to reduce the dimensions of the state space and action space of the MDP $\Gamma$. We construct a reduced MDP, namely $\Gamma_{\text{red}}$, by restricting the action space to the possible allocations by the optimal policy and by aggregating all the queue lengths at which the optimal policy allocates zero duration to the current task into one state $N_{\max} + 1$. Thus, we pick the new action space as $[0, \bar f^\dagger(\bar c/\bar w)]$ and the new state space as $\{0, \dots, N_{\max} + 1\}$. The new transition probabilities for allocating duration $t_\ell$ to task $\ell$ are defined by:


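$$p_{n_\ell n_{\ell+1}}(t_\ell) = \begin{cases} 0, & \text{if } n_{\ell+1} \in \{0, \dots, n_\ell - 2\}, \\[4pt] \dfrac{e^{-\lambda t_\ell}(\lambda t_\ell)^{n_{\ell+1} - n_\ell + 1}}{(n_{\ell+1} - n_\ell + 1)!}, & \text{if } n_{\ell+1} \in \{n_\ell - 1, \dots, N_{\max}\}, \\[4pt] 1 - \displaystyle\sum_{j=0}^{N_{\max} - n_\ell + 1} \frac{e^{-\lambda t_\ell}(\lambda t_\ell)^j}{j!}, & \text{if } n_{\ell+1} = N_{\max} + 1. \end{cases}$$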
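The aggregated state $N_{\max} + 1$ collects the whole Poisson tail, so each row of the reduced transition matrix still sums to one. A short sketch of one row, with a hypothetical function name and the arrival rate passed as `lam`:

```python
import math


def reduced_row(n: int, t: float, lam: float, n_max: int) -> dict:
    """Transition probabilities of the reduced MDP from queue length n
    (1 <= n <= n_max) after serving the current task for time t."""
    mu = lam * t
    row = {}
    # States n-1, ..., n_max: exactly k = m - n + 1 arrivals during service.
    for m in range(n - 1, n_max + 1):
        k = m - n + 1
        row[m] = math.exp(-mu) * mu**k / math.factorial(k)
    # The aggregated state n_max + 1 absorbs every larger queue length.
    row[n_max + 1] = 1.0 - sum(row.values())
    return row


row = reduced_row(n=2, t=3.0, lam=0.5, n_max=4)
print({m: round(p, 4) for m, p in row.items()})
```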


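The reward function $r : \mathbb{N} \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ for allocation of duration $t_\ell$ to task $\ell$ is defined by $r(n_\ell, t_\ell) = \bar w \bar f(t_\ell) - \bar c n_\ell t_\ell - \bar c \lambda t_\ell^2/2$. We can now state the following equivalence.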

Corollary 4 (Reducing the action space and the state space). The Markov decision processes $\Gamma$ and $\Gamma_{\text{red}}$ yield the same optimal policy.


Proof. It can be verified that the value function for the two MDPs is the same. The reduction of the action space and the state space follows from Lemma 3. □


6. Dynamic queue with latency penalty: receding horizon algorithm

As discussed in the previous section, the computation of the optimal policy for the infinite horizon average cost MDP problem (9) is expensive and grows exponentially with the dimension of the state space and action space. We rely on the receding horizon framework discussed in Section 2 to develop an approximation algorithm to determine the solution of the MDP $\Gamma$ in finite time. As discussed in Algorithm 1, the receding horizon framework solves a finite horizon optimization problem at each iteration. We now study such a finite horizon optimization problem for the certainty-equivalent problem (9).

6.1. Finite horizon optimization

We now study the finite horizon optimization problem with horizon length $N$ that the receding horizon policy solves at each iteration. It follows from Lemma 3 that the MDP formulation is identical to the certainty-equivalent problem. Therefore, we focus on the solution of the finite horizon certainty-equivalent problem. Under Assumption 2, we treat $\bar f$ as a sigmoid function. For ease of notation, we denote $\bar f$ and $\bar c/\bar w$ by $f$ and $c$, respectively. We now introduce the following finite horizon optimization problem that needs to be solved at each stage in the receding horizon framework:

$$\begin{aligned} \underset{t \ge 0}{\text{maximize}} \quad & \frac{1}{N} \sum_{\ell=1}^{N} \Big(f(t_\ell) - c\, \mathbb{E}[n_\ell \mid n_1]\, t_\ell - \frac{c \lambda t_\ell^2}{2}\Big) \\ \text{subject to} \quad & \mathbb{E}[n_{\ell+1} \mid n_1] = \max\{0, \mathbb{E}[n_\ell \mid n_1] - 1 + \lambda t_\ell\}, \end{aligned} \qquad (11)$$

where $t = \{t_1, \dots, t_N\}$ is the duration allocation vector.

Under Assumption 1, the constraint in the optimization problem (11) yields

$$\mathbb{E}[n_\ell \mid n_1] = n_1 - \ell + 1 + \lambda \sum_{j=1}^{\ell-1} t_j.$$

Substituting the expected queue length into the objective function in the optimization problem (11), one obtains the function $J : \mathbb{R}^N_{\ge 0} \to \mathbb{R}$ defined by

$$J(t) := \frac{1}{N} \sum_{\ell=1}^{N} \Big(f(t_\ell) - c(n_1 - \ell + 1)\, t_\ell - c \lambda t_\ell \sum_{j=1}^{\ell-1} t_j - \frac{c \lambda t_\ell^2}{2}\Big),$$

where $c$ is the (normalized) expected penalty rate, $\lambda$ is the arrival rate, and $n_1$ is the initial queue length. Thus, the optimization problem (11) is equivalent to

$$\underset{t \ge 0}{\text{maximize}} \quad J(t). \qquad (12)$$
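The objective $J$ and the stationarity condition used later can be cross-checked numerically. Differentiating $J$ gives $N\, \partial J / \partial t_k = f'(t_k) - c(n_1 - k + 1) - c\lambda \sum_{j=1}^{N} t_j$, because every pair of allocations enters the cross term exactly once. The sketch below, assuming a logistic $f$ with the parameters of Example 3 and arbitrary test allocations, compares this formula with a central finite difference.

```python
import math

A, B = 1.0853, 4.3027  # assumed logistic sigmoid parameters


def f(t):
    return 1.0 / (1.0 + math.exp(-A * t + B))


def f_prime(t):
    s = f(t)
    return A * s * (1.0 - s)


def J(t, n1, c, lam):
    """Objective (12), using E[n_l] = n1 - l + 1 + lam * (time already spent)."""
    total, spent = 0.0, 0.0
    for ell, t_l in enumerate(t, start=1):
        expected_n = n1 - ell + 1 + lam * spent
        total += f(t_l) - c * expected_n * t_l - c * lam * t_l**2 / 2
        spent += t_l
    return total / len(t)


def grad_J(t, n1, c, lam):
    """Analytic gradient of J."""
    s = sum(t)
    return [(f_prime(t_k) - c * (n1 - k + 1) - c * lam * s) / len(t)
            for k, t_k in enumerate(t, start=1)]


t, n1, c, lam, h = [6.0, 6.5, 7.0], 2, 0.02, 0.3, 1e-6
numeric = []
for k in range(len(t)):
    up, down = list(t), list(t)
    up[k] += h
    down[k] -= h
    numeric.append((J(up, n1, c, lam) - J(down, n1, c, lam)) / (2 * h))
print(numeric, grad_J(t, n1, c, lam))
```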


In the remainder of Section 6.1, we propose a procedure to determine the solution of the optimization problem (11). To develop this procedure, we study some properties of the optimal policy. Assume that the solution to the optimization problem (11) allocates a strictly positive time only to the tasks in the set $\mathcal{T}_{\text{proc}} \subseteq \{1, \dots, N\}$, which we call the set of processed tasks. (Accordingly, the policy allocates zero time to the tasks in $\{1, \dots, N\} \setminus \mathcal{T}_{\text{proc}}$.) Without loss of generality, assume

$$\mathcal{T}_{\text{proc}} := \{\eta_1, \dots, \eta_m\},$$

where $\eta_1 < \dots < \eta_m$ and $m \le N$. A duration allocation vector $t$ is said to be consistent with $\mathcal{T}_{\text{proc}}$ if only the tasks in $\mathcal{T}_{\text{proc}}$ are allocated non-zero duration.

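Lemma 5 (Properties of maximum points). For the optimization problem (12) and a set of processed tasks $\mathcal{T}_{\text{proc}}$, the following statements hold:

(i). a global maximum point $t^*$ satisfies $t^*_{\eta_1} \le t^*_{\eta_2} \le \dots \le t^*_{\eta_m}$;

(ii). a local maximum point $t^*$ consistent with $\mathcal{T}_{\text{proc}}$ satisfies

$$f'(t^*_{\eta_k}) = c(n_1 - \eta_k + 1) + c\lambda \sum_{i=1}^{m} t^*_{\eta_i}, \quad \text{for all } k \in \{1, \dots, m\}; \qquad (13)$$

(iii). the system of equations (13) can be reduced to

$$\Pi(t^*_{\eta_1}) = f'(t^*_{\eta_1}) \quad \text{and} \quad t^*_{\eta_k} = f^\dagger\big(f'(t^*_{\eta_1}) - c(\eta_k - \eta_1)\big),$$

for each $k \in \{2, \dots, m\}$, where $\Pi : \mathbb{R}_{\ge 0} \to \mathbb{R} \cup \{+\infty\}$ is defined by

$$\Pi(t) = \begin{cases} p(t), & \text{if } f'(t) \ge c(\eta_m - \eta_1), \\ +\infty, & \text{otherwise}, \end{cases}$$

where $p(t) = c\Big(n_1 - \eta_1 + 1 + \lambda t + \lambda \sum_{k=2}^{m} f^\dagger\big(f'(t) - c(\eta_k - \eta_1)\big)\Big)$;

(iv). a local maximum point $t^*$ consistent with $\mathcal{T}_{\text{proc}}$ satisfies

$$f''(t^*_{\eta_k}) \le c\lambda, \quad \text{for all } k \in \{1, \dots, m\}.$$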


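We refer to the function $\Pi$ as the effective penalty rate for the first processed task. A typical graph of $\Pi$ is shown in Figure 6(b). Given $\mathcal{T}_{\text{proc}}$, a feasible allocation to the task $\eta_1$ is such that $f'(t_{\eta_1}) - c(\eta_j - \eta_1) \ge 0$, for each $j \in \{2, \dots, m\}$. For a given $\mathcal{T}_{\text{proc}}$, we define the minimum feasible duration allocated to task $\eta_1$ (see Figure 6(a)) by

$$\tau_1 := \begin{cases} \min\{t \in \mathbb{R}_{\ge 0} \mid f'(t) = c(\eta_m - \eta_1)\}, & \text{if } f'(t_{\text{inf}}) \ge c(\eta_m - \eta_1), \\ 0, & \text{otherwise}, \end{cases}$$

where $t_{\text{inf}}$ is the inflection point of $f$.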


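Proof. We start by proving the first statement. Assume $t^*_{\eta_j} > t^*_{\eta_k}$ for some $j < k$, and define the allocation vector $\tilde t$ consistent with $\mathcal{T}_{\text{proc}}$ by

$$\tilde t_{\eta_i} = \begin{cases} t^*_{\eta_i}, & \text{if } i \in \{1, \dots, m\} \setminus \{j, k\}, \\ t^*_{\eta_j}, & \text{if } i = k, \\ t^*_{\eta_k}, & \text{if } i = j. \end{cases}$$

Since swapping two allocations changes only the term $c\,\ell\, t_\ell$ in $N J(t)$, it is easy to see that

$$J(t^*) - J(\tilde t) = \frac{c}{N}(\eta_j - \eta_k)(t^*_{\eta_j} - t^*_{\eta_k}) < 0.$$

This inequality contradicts the assumption that $t^*$ is a global maximum of $J$.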

To prove the second statement, note that a local maximum is achieved either at the boundary of the feasible region or in the set where the Jacobian of $J$ is zero. At the boundary of the feasible region $\mathbb{R}^N_{\ge 0}$, some of the allocations are zero. Given the $m$ non-zero allocations, the Jacobian of the function $J$ projected on the space spanned by the non-zero allocations must be zero. The expressions in the lemma are obtained by setting this Jacobian to zero.

To prove the third statement, we subtract the expression in equation (13) for $k = j$ from the expression for $k = 1$ to get

$$f'(t_{\eta_j}) = f'(t_{\eta_1}) - c(\eta_j - \eta_1). \qquad (14)$$

There exists a solution of equation (14) if and only if $f'(t_{\eta_1}) \ge c(\eta_j - \eta_1)$. If $f'(t_{\eta_1}) < c(\eta_j - \eta_1) + f'(0)$, then there exists only one solution. Otherwise, there exist two solutions. It can be seen that if there exist two solutions $t^\pm_{\eta_j}$ with $t^-_{\eta_j} < t^+_{\eta_j}$, then $t^-_{\eta_j} < t_{\eta_1} < t^+_{\eta_j}$. From the first statement, it follows that the only possible allocation is $t^+_{\eta_j}$. Notice that $t^+_{\eta_j} = f^\dagger\big(f'(t_{\eta_1}) - c(\eta_j - \eta_1)\big)$. This choice yields a feasible time allocation to each task $\eta_j$, $j \in \{2, \dots, m\}$, parametrized by the time allocation to the task $\eta_1$. A typical allocation is shown in Figure 6(a). We further note that the effective penalty rate for the task $\eta_1$ is $c(n_1 - \eta_1 + 1) + c\lambda \sum_{j=1}^{m} t_{\eta_j}$. Using the expression of $t_{\eta_j}$, $j \in \{2, \dots, m\}$, parametrized by $t_{\eta_1}$, we obtain the expression for $\Pi$.

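To prove the last statement, we observe that the Hessian of the function $J$ with respect to the non-zero allocations is

$$\frac{1}{N}\Big(\operatorname{diag}\big(f''(t_{\eta_1}), \dots, f''(t_{\eta_m})\big) - c\lambda\, \mathbf{1}_m \mathbf{1}_m^\top\Big),$$

where $\operatorname{diag}(\cdot)$ represents a diagonal matrix with the argument as diagonal entries and $\mathbf{1}_m$ is the vector of $m$ ones. For a local maximum to exist at non-zero duration allocations $\{t_{\eta_1}, \dots, t_{\eta_m}\}$, the Hessian must be negative semidefinite. A necessary condition for the Hessian to be negative semidefinite is that its diagonal entries are non-positive. □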
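Lemma 5(iii) reduces the search to the single scalar $t_{\eta_1}$. The sketch below, assuming the logistic $f$ used in the earlier examples and a hypothetical processed set $\{1, 3, 4\}$, builds the remaining allocations from $t_{\eta_1}$ via $f^\dagger$, evaluates $\Pi$, and checks relation (14) together with the ordering in statement (i). The helper names are our own.

```python
import math

A, B = 1.0853, 4.3027  # assumed logistic sigmoid parameters


def f(t):
    return 1.0 / (1.0 + math.exp(-A * t + B))


def f_prime(t):
    s = f(t)
    return A * s * (1.0 - s)


def f_dagger(y):
    """Largest t with f'(t) = y; 0 if y is outside (0, A/4]."""
    if y <= 0.0 or y > A / 4.0:
        return 0.0
    s = 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * y / A)))
    return (B + math.log(s / (1.0 - s))) / A


def allocations(t1, eta, c):
    """t_{eta_k} = f_dagger(f'(t1) - c (eta_k - eta_1)) for k >= 2."""
    return [t1] + [f_dagger(f_prime(t1) - c * (e - eta[0])) for e in eta[1:]]


def effective_penalty(t1, eta, n1, c, lam):
    """Pi(t1): +inf when f'(t1) < c (eta_m - eta_1), otherwise p(t1)."""
    if f_prime(t1) < c * (eta[-1] - eta[0]):
        return math.inf
    rest = sum(allocations(t1, eta, c)[1:])
    return c * (n1 - eta[0] + 1 + lam * t1 + lam * rest)


eta, n1, c, lam, t1 = [1, 3, 4], 2, 0.02, 0.3, 5.5
alloc = allocations(t1, eta, c)
print(alloc, effective_penalty(t1, eta, n1, c, lam))
```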


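Let $f''_{\max}$ be the maximum value of $f''$. We now define the points at which the function $f'' - c\lambda$ changes its sign (see Figure 2(b)):

$$\delta_1 := \begin{cases} \min\{t \in \mathbb{R}_{\ge 0} \mid f''(t) = c\lambda\}, & \text{if } c\lambda \in [f''(0), f''_{\max}], \\ 0, & \text{otherwise}, \end{cases}$$

$$\delta_2 := \begin{cases} \max\{t \in \mathbb{R}_{\ge 0} \mid f''(t) = c\lambda\}, & \text{if } c\lambda \le f''_{\max}, \\ 0, & \text{otherwise}. \end{cases}$$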
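For a logistic sigmoid, $\delta_1$ and $\delta_2$ can be computed by bisection, because $f''$ increases up to its peak and then decreases through zero at the inflection point. A minimal sketch, assuming $f(t) = 1/(1 + e^{-at+b})$ with the Example 3 parameters; the peak of $f''$ is where $f = (3 - \sqrt{3})/6$, which follows from $f''' = a^3 f(1-f)(1 - 6f + 6f^2)$. Function names are hypothetical.

```python
import math

A, B = 1.0853, 4.3027  # assumed logistic sigmoid parameters (Example 3)


def f(t):
    return 1.0 / (1.0 + math.exp(-A * t + B))


def fpp(t):
    """Second derivative of the logistic sigmoid."""
    s = f(t)
    return A * A * s * (1.0 - s) * (1.0 - 2.0 * s)


def bisect(g, lo, hi, iters=80):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == (g_lo > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def deltas(c, lam):
    """(delta_1, delta_2): points where f''(t) - c*lam changes sign."""
    level = c * lam
    t_inf = B / A                              # inflection point of f
    s_peak = (3.0 - math.sqrt(3.0)) / 6.0      # value of f where f'' peaks
    t_peak = max(0.0, (B + math.log(s_peak / (1.0 - s_peak))) / A)
    f2_max = fpp(t_peak)
    g = lambda t: fpp(t) - level
    delta1 = bisect(g, 0.0, t_peak) if fpp(0.0) <= level <= f2_max else 0.0
    delta2 = bisect(g, t_peak, t_inf) if level < f2_max else 0.0
    return delta1, delta2


print(deltas(0.1, 0.5), deltas(0.02, 0.5))
```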


Figure 6: (a) Feasible allocations to the second processed task parametrized by the allocation to the first processed task. (b) The penalty rate and the sigmoid derivative as a function of the allocation to the first task.

Theorem 6 (Finite horizon optimization). Consider the optimization problem (12) and a set of processed tasks $\mathcal{T}_{\text{proc}}$. The following statements are equivalent:

(i). there exists a local maximum point consistent with $\mathcal{T}_{\text{proc}}$;

(ii). one of the following conditions holds:

$$f'(\delta_2) \ge \Pi(\delta_2), \quad \text{or} \qquad (15)$$

$$f'(\delta_2) < \Pi(\delta_2), \quad f'(\delta_1) \ge \Pi(\delta_1), \quad \text{and} \quad \delta_1 \ge \tau_1. \qquad (16)$$

Proof. A critical allocation to task $\eta_1$ is located at the intersection of the graph of the reward rate $f'(t_{\eta_1})$ and the effective penalty rate $\Pi(t_{\eta_1})$. From Lemma 5, a necessary condition for the existence of a local maximum at a critical point is $f''(t_{\eta_1}) \le c\lambda$, which holds for $t_{\eta_1} \in\ ]0, \delta_1] \cup [\delta_2, \infty[$. It can be seen that if condition (15) holds, then the function $f'(t_{\eta_1})$ and the effective penalty function $\Pi(t_{\eta_1})$ intersect in the region $[\delta_2, \infty[$. Similarly, condition (16) ensures the intersection of the graph of the reward function $f'(t_{\eta_1})$ with the effective penalty function $\Pi(t_{\eta_1})$ in the region $]0, \delta_1]$. □

We now provide a procedure to determine the solution to the optimization problem (12). Given a sequence of zero and non-zero allocations $\xi \in \{0, +\}^N$, we denote the corresponding critical allocation for maximum by $t^\dagger(\xi)$. The details of the procedure are shown in Algorithm 2. We refer to the policy obtained from the receding horizon algorithm that solves the optimization problem (12) at each stage as the certainty-equivalent policy.
Algorithm 2 Finite horizon allocation algorithm
1: given $n_1$, $N$, $c$, $\lambda$
2: $k := 0$; $\mathcal{J} := \emptyset$
3: for each string $\xi \in \{0, +\}^N$
4: set $\mathcal{T}_{\text{proc}} := \{\ell \in \{1, \dots, N\} \mid \xi_\ell = +\}$
5: if condition (15) or (16) holds
6: then determine the critical allocation for maximum $t_{\eta_1}$ via the bisection algorithm
7: determine allocations $t_{\eta_j} = f^\dagger\big(f'(t_{\eta_1}) - c(\eta_j - \eta_1)\big)$, $j \in \{2, \dots, m\}$
8: determine the expected queue length $\mathbb{E}[n_\ell]$, $\ell \in \{1, \dots, N\}$
9: if $\mathbb{E}[n_\ell] > 0$, $\forall \ell \in \{1, \dots, N\}$
10: then $\mathcal{J} := \mathcal{J} \cup \{t^\dagger(\xi)\}$
11: optimal allocation $t^* = \operatorname{argmax}_{t \in \mathcal{J}} J(t)$
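A brute-force sketch of Algorithm 2 follows, under stated assumptions: a logistic $f$ with the Example 3 parameters, and a swapped-in root search. Instead of testing conditions (15) and (16), it scans $f' - \Pi$ on a grid over $[0, 30]$ for sign changes and refines each one by bisection, discarding spurious sign changes caused by the jump of $\Pi$ to $+\infty$. All function names, the grid size, and the scan range are our own choices; the cost grows as $2^N$, as noted in Remark 4 below.

```python
import itertools
import math

A, B = 1.0853, 4.3027  # assumed logistic sigmoid parameters (from Example 3)


def f(t):
    return 1.0 / (1.0 + math.exp(-A * t + B))


def fp(t):
    s = f(t)
    return A * s * (1.0 - s)


def f_dagger(y):
    """Largest t with f'(t) = y (0 if y is outside (0, A/4])."""
    if y <= 0.0 or y > A / 4.0:
        return 0.0
    s = 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * y / A)))
    return (B + math.log(s / (1.0 - s))) / A


def J(t, n1, c, lam):
    """Objective (12), using E[n_l] = n1 - l + 1 + lam * (time already spent)."""
    total, spent = 0.0, 0.0
    for ell, t_l in enumerate(t, start=1):
        total += f(t_l) - c * (n1 - ell + 1 + lam * spent) * t_l - c * lam * t_l**2 / 2
        spent += t_l
    return total / len(t)


def alloc(t1, eta, c):
    """Lemma 5(iii): allocations to all processed tasks, given t_{eta_1} = t1."""
    return [t1] + [f_dagger(fp(t1) - c * (e - eta[0])) for e in eta[1:]]


def gap(t1, eta, n1, c, lam):
    """f'(t1) - Pi(t1), with -inf where Pi(t1) = +inf."""
    if fp(t1) < c * (eta[-1] - eta[0]):
        return -math.inf
    return fp(t1) - c * (n1 - eta[0] + 1 + lam * sum(alloc(t1, eta, c)))


def critical_points(eta, n1, c, lam, t_max=30.0, grid=3000):
    """Roots of f' - Pi on [0, t_max]: grid scan for sign changes + bisection."""
    g = lambda t: gap(t, eta, n1, c, lam)
    roots, t_prev, g_prev = [], 0.0, g(0.0)
    for i in range(1, grid + 1):
        t_cur = t_max * i / grid
        g_cur = g(t_cur)
        if (g_prev > 0) != (g_cur > 0):
            lo, hi = t_prev, t_cur
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if (g(mid) > 0) == (g_prev > 0):
                    lo = mid
                else:
                    hi = mid
            root = 0.5 * (lo + hi)
            if abs(g(root)) < 1e-8:  # discard jumps of Pi, keep true roots
                roots.append(root)
        t_prev, g_prev = t_cur, g_cur
    return roots


def finite_horizon_allocation(N, n1, c, lam):
    """Brute-force sketch of Algorithm 2 over all strings xi in {0, +}^N."""
    best_t, best_val = None, -math.inf
    for xi in itertools.product((0, 1), repeat=N):
        eta = [ell for ell, x in enumerate(xi, start=1) if x]
        for t1 in (critical_points(eta, n1, c, lam) if eta else [None]):
            t = [0.0] * N
            if eta:
                a = alloc(t1, eta, c)
                if min(a) <= 0.0:
                    continue  # some processed task would receive zero time
                for e, a_e in zip(eta, a):
                    t[e - 1] = a_e
            # Step 9: the expected queue length must stay positive.
            spent, ok = 0.0, True
            for ell in range(1, N + 1):
                ok = ok and n1 - ell + 1 + lam * spent > 0
                spent += t[ell - 1]
            if ok and J(t, n1, c, lam) > best_val:
                best_t, best_val = t, J(t, n1, c, lam)
    return best_t, best_val


t_opt, value = finite_horizon_allocation(N=3, n1=1, c=0.1380 / 6.4, lam=0.25)
print([round(x, 3) for x in t_opt], round(value, 4))
```

The grid scan trades the analytic case distinction of Theorem 6 for simplicity; it only finds roots that produce a sign change at the chosen resolution.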


Remark 4 (Computational complexity of Algorithm 2). In the worst case, Algorithm 2 requires a comparison of the solutions of $2^N - 1$ optimization problems. Although the number of worst-case comparisons grows exponentially with the chosen horizon length $N$, it remains reasonable for fairly large horizon lengths ($N < 10$). □

Remark 5 (Comparison with a concave utility). For the dynamic queue with latency penalty, as the penalty rate or the arrival rate increases, the duration allocation decreases to a critical value and then jumps down to zero. In contrast, if the performance function is concave instead of sigmoid, then the duration allocation decreases continuously to zero with increasing penalty rate as well as increasing arrival rate. □

6.2. Performance of receding horizon algorithm

We now derive performance bounds on the certainty-equivalent policy. First, we determine a global upper bound on the performance of any policy for the MDP $\Gamma$. Then, we develop a lower bound on the performance of the unit horizon certainty-equivalent policy, that is, the policy obtained from the receding horizon algorithm that solves optimization problem (11) with horizon length $N = 1$ at each iteration. The performance of the unit horizon certainty-equivalent policy provides a lower bound to the performance of any certainty-equivalent policy that solves a finite horizon problem with horizon length $N > 1$ at each stage. Let $t^{\text{rec}}$ be the sequence of duration allocations under a certainty-equivalent policy. Without loss of generality, we assume that the initial queue length is unity. If the initial queue length is larger than one, then we drop tasks until the queue length is unity. Note that this does not affect the infinite horizon average value function. We also assume that the latency penalty is small enough to ensure an optimal non-zero duration allocation if only one task is present in the queue, that is, $\bar c < \bar w \psi_{\bar f}$. We now derive a lower bound on the performance of the unit horizon certainty-equivalent policy, which is also a lower bound on the performance of any certainty-equivalent policy.

Theorem 7 (Bounds on performance). For the Markov decision process $\Gamma$ and any certainty-equivalent policy, the following statements hold, provided $\bar c < \bar w \psi_{\bar f}$:

(i). the average value function satisfies the following upper bound

$$V_{\text{avg}}(n_1, t) \le \bar w \bar f\big(\bar f^\dagger(\bar c/\bar w)\big) - \bar c\, \bar f^\dagger(\bar c/\bar w),$$

for each $n_1 \in \mathbb{N}$ and any non-negative sequence $t$;

(ii). the average value function satisfies the following lower bound for any certainty-equivalent policy:

$$V_{\text{avg}}(n_1, t^{\text{rec}}) \ge \begin{cases} \bar w \bar f(\tau_{\max}) - \bar c \tau_{\max} - \dfrac{\bar c \lambda \tau_{\max}^2}{2}, & \text{if } 0 < \lambda < \dfrac{1}{\tau_{\max}}, \\[8pt] \dfrac{1}{\lceil \lambda \tau_{\max} \rceil}\Big(\bar w \bar f(\tau_{\min}) - \bar w \psi_{\bar f} \tau_{\max} - \dfrac{\bar c \lambda \tau_{\max}^2}{2}\Big), & \text{otherwise}, \end{cases}$$

for each $n_1 \in \mathbb{N}$, where $\tau_{\max} = \bar f^\dagger(\bar c/\bar w)$ and $\tau_{\min} = \bar f^\dagger(\psi_{\bar f})$.


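Proof. We start by establishing the first statement. We recall from Lemma 3 that the value function $V_{\text{avg}}$ is identical to the objective function of the certainty-equivalent problem (9), that is,

$$\begin{aligned} V_{\text{avg}}(n_1, t) &= \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \Big(\bar w \bar f(t_\ell) - \bar c\, \mathbb{E}[n_\ell \mid n_1]\, t_\ell - \frac{\bar c \lambda t_\ell^2}{2}\Big) \\ &\le \lim_{N \to +\infty} \frac{1}{N} \sum_{\ell=1}^{N} \big(\bar w \bar f(t_\ell) - \bar c t_\ell\big) \\ &\le \bar w \bar f\big(\bar f^\dagger(\bar c/\bar w)\big) - \bar c\, \bar f^\dagger(\bar c/\bar w), \end{aligned}$$

where the first inequality uses $\mathbb{E}[n_\ell \mid n_1] \ge 1$ (Assumption 1) and drops the non-positive quadratic term, and the last inequality follows from Lemma 1.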

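In order to determine a lower bound, we construct the following allocation policy:

$$t^{\text{low}}_\ell = \begin{cases} \tau_{\max}, & \text{if } 0 < \lambda < 1/\tau_{\max}, \\ t^{\text{crit}}_\ell, & \text{otherwise}, \end{cases}$$

for each $\ell \in \mathbb{N}$, where $t^{\text{crit}}_\ell \in \operatorname{argmax}\{\bar w \bar f(\beta) - \bar c \hat n_\ell \beta \mid \beta \in \{0, \bar f^\dagger(\hat n_\ell \bar c/\bar w)\}\}$ and $\hat n_\ell = \mathbb{E}[n_\ell \mid n_1]$. We note that the unit horizon certainty-equivalent policy allocates duration $t^{\text{unit}}_\ell \in \operatorname{argmax}\{\bar w \bar f(t) - \bar c \hat n_\ell t - \bar c \lambda t^2/2 \mid t \in \mathbb{R}_{\ge 0}\}$ to task $\ell \in \mathbb{N}$. Therefore,

$$\bar w \bar f(t^{\text{unit}}_\ell) - \bar c \hat n_\ell t^{\text{unit}}_\ell - \bar c \lambda (t^{\text{unit}}_\ell)^2/2 \ge \bar w \bar f(t^{\text{low}}_\ell) - \bar c \hat n_\ell t^{\text{low}}_\ell - \bar c \lambda (t^{\text{low}}_\ell)^2/2$$
$$\implies V_{\text{avg}}(n_1, t^{\text{unit}}) \ge V_{\text{avg}}(n_1, t^{\text{low}}).$$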


We first consider the case when $0 < \lambda < 1/\tau_{\max}$. The constructed policy allocates duration $\tau_{\max}$ to each task. For the certainty-equivalent problem, a new task arrives in time $1/\lambda > \tau_{\max}$, that is, after servicing the current task, the queue is either empty or has one task. Therefore, the expected reward for each task is $\bar w \bar f(\tau_{\max}) - \bar c \tau_{\max} - \bar c \lambda \tau_{\max}^2/2$.

In the second case, we note that the maximum allocation to each task under the constructed policy is $\tau_{\max}$ and hence, the maximum number of expected arrivals while processing the current task is $\lambda \tau_{\max}$. In the worst possible case, $\lceil \lambda \tau_{\max} \rceil - 1$ tasks would be dropped before the next task is served. Further, the duration allocation to the task is in the interval $[\tau_{\min}, \tau_{\max}]$ and the penalty $\bar c < \bar w \psi_{\bar f}$. Thus, the lower bound follows. □
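For concrete numbers, the sketch below evaluates the upper bound of Theorem 7(i) and the first branch of the lower bound in Theorem 7(ii), valid for $0 < \lambda < 1/\tau_{\max}$, using the data of Example 3. It assumes the logistic form $\bar f(t) = 1/(1 + e^{-at+b})$ with weight $\bar w = 6.4$ and penalty $\bar c = 0.1380$; the second branch depends on $\psi_{\bar f}$ and is not reproduced here. Names are hypothetical.

```python
import math

A, B = 1.0853, 4.3027  # assumed sigmoid parameters (Example 3)
W, C = 6.4, 0.1380     # weight and average penalty rate (Example 3)


def f(t):
    return 1.0 / (1.0 + math.exp(-A * t + B))


def f_dagger(y):
    """Largest t with f'(t) = y; 0 if y is outside (0, A/4]."""
    if y <= 0.0 or y > A / 4.0:
        return 0.0
    s = 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * y / A)))
    return (B + math.log(s / (1.0 - s))) / A


tau_max = f_dagger(C / W)
upper = W * f(tau_max) - C * tau_max  # Theorem 7(i)


def lower_bound_low_rate(lam):
    """Theorem 7(ii), first branch; only valid for 0 < lam < 1 / tau_max."""
    if not 0.0 < lam < 1.0 / tau_max:
        raise ValueError("first branch requires 0 < lam < 1/tau_max")
    return W * f(tau_max) - C * tau_max - C * lam * tau_max**2 / 2


print(round(tau_max, 3), round(upper, 3), round(lower_bound_low_rate(0.1), 3))
```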
We now illustrate the concepts discussed in this section with an example.

Example 3 (Certainty-equivalent policy). Suppose that the human operator has to serve a queue of tasks with Poisson arrival at the rate $\lambda$ per second. The set of the tasks is the same as in Example 2 and each task is sampled uniformly from this set. For this set of data, the average performance function is $f(t) = w/(1 + e^{-at+b})$, where $w = 6.4$, $a = 1.0853$, and $b = 4.3027$. The average penalty rate is $c = 0.1380$ per second. The certainty-equivalent policy that solves problem (11) with horizon length $N = 10$ at each stage is shown in Figure 7. It can be seen that the certainty-equivalent policy drops more tasks at higher arrival rates and tries to maintain a single task in the queue. The performance of the certainty-equivalent policy along with the global upper bound on the performance of any policy and the lower bound on the performance of any certainty-equivalent policy is shown in Figure 8. As expected, for low arrival rates the certainty-equivalent policy achieves a performance very close to the global upper bound. □


Figure 7: Certainty-equivalent policy: (a) low arrival rate, (b) moderate arrival rate, (c) high arrival rate (horizontal axis: task). An optimization problem with horizon length $N = 10$ is solved at each stage. The arrival rates for the three scenarios are $\lambda = 0.25$, $0.5$ and $1$, respectively.


Figure 8: Bounds on performance (horizontal axis: arrival rate). The solid red curve represents the average value function under the certainty-equivalent policy, the dashed-dotted black line represents the upper bound on any policy, and the dashed green curve represents the lower bound on any certainty-equivalent policy.

Discussion 8 (Optimal arrival rate). The performance of the certainty-equivalent policy as a function of the arrival rate is shown in Figure 9. It can be seen that the expected benefit per unit task, that is, the value of the average value function under the certainty-equivalent policy, decreases slowly until a critical arrival rate and then starts decreasing quickly. This critical arrival rate corresponds to the situation where a new task is expected to arrive as soon as the operator finishes processing the current task. For the set of data considered, the benefit per unit time achieves its maximum at this critical arrival rate. This is not true in general, and the maximum may be achieved at a value higher than the critical arrival rate. Thus, the arrival rate maximizing benefit per unit time may result in poor average decision quality on each task. The objective of the designer is to achieve a good performance on each task and therefore, the arrival rate should be picked close to the critical arrival rate. It can be verified that the critical arrival rate is $\lambda_{\text{crit}} = 1/f^\dagger(2c/w)$. In general, there may be other performance goals for the operator, and accordingly, a higher task arrival rate for the queue could be designed. □
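As a worked instance of the formula $\lambda_{\text{crit}} = 1/f^\dagger(2c/w)$, the sketch below plugs in the data of Example 3, assuming the sigmoid there is the logistic $1/(1 + e^{-at+b})$ scaled by $w$, so that $f^\dagger$ is evaluated at the ratio $2c/w$ for the unscaled curve. The variable names are hypothetical.

```python
import math

A, B = 1.0853, 4.3027  # assumed sigmoid parameters from Example 3
W, C = 6.4, 0.1380     # weight and average penalty rate from Example 3


def f_dagger(y):
    """Largest t with f'(t) = y for the unscaled logistic; 0 if none."""
    if y <= 0.0 or y > A / 4.0:
        return 0.0
    s = 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - 4.0 * y / A)))
    return (B + math.log(s / (1.0 - s))) / A


lam_crit = 1.0 / f_dagger(2.0 * C / W)
print(round(lam_crit, 4))
```

A doubled penalty ratio shortens the allocation, so $\lambda_{\text{crit}}$ is slightly larger than $1/f^\dagger(c/w)$.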


Figure 9: Expected benefit per unit task and per unit time over a finite horizon under the certainty-equivalent policy. The dashed-dotted black, solid red and dashed green curves correspond to latency penalties 0.01, 0.025, and 0.05, respectively.


7. Dynamic queue with latency penalty: receding horizon algorithm with real time information

We studied the receding horizon policies for the certainty-equivalent problem, which is identical to the infinite horizon average cost formulation of the underlying MDP. While designing the decision making queue, the true realization of the tasks and the associated latency penalties and importance are not known. Therefore, the policy is designed for the expected evolution of the queue. In particular, the computation of the value function in equation (8) involved the expectation over realizations of the queue. In real time, the information about the nature of the current tasks in the queue is available and should be incorporated in the value function. We incorporate this information in the following way. We define a new value function $V^{\text{real}}_N : \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^N_{\ge 0} \to \mathbb{R}$ by

$$V^{\text{real}}_N(d, w, C, t) = \sum_{\ell=1}^{N} \mathbb{E}\big[r(d_\ell, n_\ell, n'_\ell, t_\ell) \mid \mathcal{F}_\ell\big],$$

where $\mathbb{R}^{\infty}_{>0}$ represents sequences of positive real numbers, $\mathcal{F}_\ell$ represents the sigma algebra containing all the information available when task $\ell$ is processed, and $d$, $w$, and $C$ are the sequences of realized difficulty levels, weights, and latency penalties, respectively.

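With the real time information, the infinite horizon average value function of the MDP, $V^{\text{real}}_{\text{avg}} : \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^{\infty}_{>0} \times \mathbb{R}^{\infty}_{\ge 0} \to \mathbb{R}$, is defined by

$$V^{\text{real}}_{\text{avg}}(d, w, C, t) = \lim_{N \to +\infty} \frac{1}{N} V^{\text{real}}_N(d, w, C, t).$$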
In the spirit of Section 6, we develop receding horizon algorithms to maximize $V^{\text{real}}_{\text{avg}}$. We solve the associated finite horizon problem using dynamic programming with discretized action and state space.

Remark 6 (Finite horizon problem). It can be verified that the finite horizon problem associated with the maximization of $V^{\text{real}}_{\text{avg}}$ is similar to the optimization problem (11), but due to the non-identical nature of the tasks, the allocations to the processed tasks cannot be parametrized as a function of the allocation to the first processed task (see Lemma 5). Thus, the search for the optimal allocation cannot be reduced to a one-dimensional search. This makes the extension of the techniques in Section 6 to the maximization of $V^{\text{real}}_{\text{avg}}$ intractable. Therefore, we utilize dynamic programming with discretized action and state space to approximately solve the finite horizon problem. □

Before we present the receding horizon algorithm, we introduce a few notations. An argument analogous to the one in Lemma 3 shows that under the optimal policy the maximum allocation to a sigmoid function $f$ with latency penalty $c$ and weight $w$ is upper bounded by $f^\dagger(c/w)$. We define the maximum allocation to any sigmoid function by $\delta_{\max} = \sup\{f_d^\dagger(c^{\min}/w^{\max}) \mid d \in \mathcal{D}\}$. Given horizon length $N$, current queue length $n_\ell < N$, the realization of the sigmoid functions $f_1, \dots, f_{n_\ell}$, the associated latency penalties $c_1, \dots, c_{n_\ell}$, and importance $w_1, \dots, w_{n_\ell}$, we define the reward associated with task $j \in \{1, \dots, N\}$ by

$$r_j = \begin{cases} r^{\text{real}}_j, & \text{if } 1 \le j \le n_\ell, \\ r^{\text{exp}}_j, & \text{if } n_\ell + 1 \le j \le N, \end{cases} \qquad (17)$$

where $r^{\text{real}}_j = w_j f_j(t_j) - \big(\sum_{i=j}^{n_\ell} c_i + (\mathbb{E}[n_j] - n_\ell + j - 1)\, \bar c\big)\, t_j - \bar c \lambda t_j^2/2$ and $r^{\text{exp}}_j = \bar w \bar f(t_j) - \bar c\, \mathbb{E}[n_j]\, t_j - \bar c \lambda t_j^2/2$. We now formally introduce this dynamic programming based algorithm in Algorithm 3 and refer to it as the adaptive allocation algorithm. This algorithm incorporates the precise information of the tasks currently waiting in the queue while processing each task and thus adapts the allocation policy as new information becomes available. We will now provide numerical evidence to show that the adaptive allocation policy improves the performance over the policies discussed in Section 6.

Algorithm 3 Adaptive Allocation Algorithm
1: Given: $f_d$, $d \in \mathcal{D}$, horizon length $N$, arrival rate $\lambda$; set $\ell = 1$
2: For task $\ell$, determine the queue length $n_\ell$, sigmoid functions $f_i$, and penalty rates $c_i$ for each task $i \in \{1, \dots, n_\ell\}$
3: if $n_\ell < N$
4: set stage rewards $r_j$ using equation (17), $\forall j \in \{1, \dots, N\}$
5: else set stage rewards, for each $j \in \{1, \dots, N\}$,
$r_j = w_j f_j(t_j) - \big(\sum_{i=j}^{n_\ell} c_i + (\mathbb{E}[n_j] - n_\ell + j - 1)\, \bar c\big)\, t_j - \bar c \lambda t_j^2/2$
6: solve the finite horizon DP with appropriately discretized allocations $t_j \in [0, \delta_{\max}]$, for each $j \in \{1, \dots, N\}$
7: allocate duration $t_1$ to the task $\ell$
8: set $\ell = \ell + 1$ and go to step 2

Example 4 (Adaptive allocation policy). For the data in Example 3, we now study the adaptive allocation policy. Adaptive allocation policies with horizon lengths 10 and 1 for a sample evolution of the queue at an arrival rate $\lambda = 0.5$ per second are shown in Figures 10 and 11, respectively. The adaptive policy tends to drop the tasks that are difficult and unimportant. The difficulty of the tasks is characterized by the inflection point of the associated sigmoid functions. Due to the heterogeneous nature of the tasks, the queue length under the adaptive policy is larger than the queue length under the certainty-equivalent policy. The queue length under the adaptive allocation policy with horizon length 1 is higher than under the adaptive allocation policy with horizon length 10. A comparison of the certainty-equivalent policy and the adaptive allocation policies is shown in Figure 12. We obtained these performance curves through Monte Carlo simulations. It can be seen that the adaptive allocation policy improves the performance significantly over the certainty-equivalent policy. Interestingly, the performance of the adaptive allocation policy with horizon length $N = 1$ is also better than that of the certainty-equivalent policy. Thus, incorporating the available information significantly improves the performance. □


Figure 10: Adaptive policy for a sample evolution of the dynamic queue with latency penalty. An optimization problem with horizon length $N = 10$ is solved at each stage.

Figure 11: Adaptive policy for a sample evolution of the dynamic queue with latency penalty. An optimization problem with horizon length $N = 1$ is solved at each stage.

Figure 12: Empirical expected benefit per unit task and per unit time. The dashed-dotted black curve represents the adaptive allocation policy with horizon length 10, the solid red curve represents the adaptive allocation policy with horizon length 1, and the dashed green curve represents the certainty-equivalent policy with horizon length 10.
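Algorithm 3 leaves the finite horizon DP of step 6 abstract. The sketch below shows one way to carry it out for the case $n_\ell \ge N$ of step 5, assuming logistic sigmoids $f_j(t) = 1/(1 + e^{-a_j t + b_j})$ with hypothetical per-task parameters. Under Assumption 1, $\mathbb{E}[n_j] - n_\ell + j - 1 = \lambda \sum_{i<j} t_i$, so the time already spent on earlier stages is a sufficient DP state. Allocations are restricted to a uniform grid on $[0, t_{\max}]$, where `t_max` plays the role of $\delta_{\max}$ and `step` is the discretization; both values are our choices, not values from the paper.

```python
import math


def logistic(t, a, b):
    """Hypothetical per-task sigmoid f_j(t) = 1 / (1 + exp(-a t + b))."""
    return 1.0 / (1.0 + math.exp(-a * t + b))


def adaptive_first_allocation(queue, N, c_bar, lam, t_max=12.0, step=0.25):
    """Discretized DP for steps 5-7 of Algorithm 3 when n_ell >= N.

    queue: list of (w, a, b, c) tuples for the waiting tasks, head first.
    Returns the allocation to the head task and the optimal DP value.
    """
    assert 1 <= N <= len(queue)
    K = int(round(t_max / step))
    # waiting[j]: sum of c_i over tasks j, ..., n_ell (0-based index j).
    waiting = [sum(task[3] for task in queue[j:]) for j in range(N)]
    value = [0.0] * (N * K + 1)  # reward-to-go after the last stage
    first_choice = 0
    for j in reversed(range(N)):
        w, a, b, _ = queue[j]
        new_value = [-math.inf] * (N * K + 1)
        choice = [0] * (N * K + 1)
        for s in range(j * K + 1):  # state: s * step time spent before task j
            rate = waiting[j] + c_bar * lam * s * step  # known + expected arrivals
            for k in range(K + 1):
                t = k * step
                r = w * logistic(t, a, b) - rate * t - c_bar * lam * t * t / 2
                if r + value[s + k] > new_value[s]:
                    new_value[s], choice[s] = r + value[s + k], k
        value = new_value
        first_choice = choice[0]  # after the loop: stage 0 from state 0
    return first_choice * step, value[0]


task = (1.0, 1.0853, 4.3027, 0.01)  # hypothetical (w, a, b, c) for each task
t1, v = adaptive_first_allocation([task] * 3, N=3, c_bar=0.01, lam=0.1)
print(t1, round(v, 3))
```

In a receding horizon loop, only the returned head allocation is applied before the queue is observed again, mirroring steps 7 and 8.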
8. Conclusions


We presented optimal servicing policies for queues where the performance function of the server is a sigmoid function. First, we considered a queue with no arrival and a latency penalty. It was observed that the optimal policy may drop some tasks. Further, for identical tasks, the duration allocation to the task increases with decreasing queue length. Second, a dynamic queue with latency penalty was considered. We first studied the scenario where no real time information about the evolution of the queue was available. This models the situation of a designer who has no information about the true realization of the queue at her disposal. A receding horizon algorithm was established for the certainty-equivalent problem and guidelines for choosing the arrival rate were suggested. We then studied the scenario where real time information about the realization of the queue was available. An adaptive allocation algorithm that incorporated all the available information about the current tasks into the allocation policy was developed. A comparison of the certainty-equivalent policy and the adaptive allocation policy was presented.

The decision support system designed in this paper assumes that the arrival rate of the tasks as well as the parameters in the performance function are known. An interesting open problem is to design policies that perform an online estimation of the arrival rate and the parameters of the performance function and simultaneously determine the optimal allocation policy. Another interesting problem is to incorporate more human factors into the optimal policy, for example, situational awareness, fatigue, etc. The policies designed in this paper rely on a first-come first-served discipline to process tasks. It would be of interest to study problems with other processing disciplines, for example, preemptive queues.